VLA-SMILES: Variable-Length-Array SMILES Descriptors in Neural Network-Based QSAR Modeling
Abstract
Machine learning represents a milestone in data-driven research, including material informatics, robotics, and computer-aided drug discovery. With the continuously growing virtual and synthetically available chemical space, efficient and robust quantitative structure–activity relationship (QSAR) methods are required to uncover molecules with desired properties. Herein, we propose variable-length-array SMILES-based (VLA-SMILES) structural descriptors that expand the conventional SMILES descriptors widely used in machine learning. This structural representation extends the family of numerically coded SMILES, particularly binary SMILES, to expedite the discovery of new deep learning QSAR models with high predictive ability. VLA-SMILES descriptors were shown to speed up the training of QSAR models based on a multilayer perceptron (MLP) with optimized backpropagation (ATransformedBP), resilient propagation (iRPROP‒), and Adam optimization learning algorithms featuring rational train–test splitting, while improving the predictive ability relative to the more compute-intensive binary SMILES representation format. All the tested MLPs under the same length-array-based SMILES descriptors showed similar predictive ability and convergence rate of training in combination with the considered learning procedures. Validation with Kennard–Stone train–test splitting, based on structural descriptor similarity metrics, was found to be more effective than partitioning with ranking by activity, based on biological activity values, for the entire set of VLA-SMILES featured QSAR models. The robustness and predictive ability of the MLP models based on VLA-SMILES were assessed via the method of QSAR parametric model validation. In addition, statistical H0 hypothesis testing of the linear regression between real and observed activities, based on the F2,n−2 criterion, was used for predictability estimation among the VLA-SMILES featured QSAR-MLPs (with n being the volume of the testing set). Both approaches, QSAR parametric model validation and statistical hypothesis testing, were found to correlate when used for the quantitative evaluation of the predictabilities of the designed QSAR models with VLA-SMILES descriptors.
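As an illustration of the Kennard–Stone train–test splitting mentioned in the abstract, the following is a minimal sketch of the standard algorithm, not the authors' implementation. The abstract does not state which descriptor similarity metric was used, so Euclidean distance over numeric descriptor vectors is an assumption here, and the function name `kennard_stone_split` is hypothetical.

```python
from math import dist


def kennard_stone_split(X, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone.

    X is a sequence of equal-length numeric descriptor vectors.
    Assumption: Euclidean distance stands in for the descriptor
    similarity metric, which the abstract does not specify.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Full pairwise distance matrix (fine for small illustrative sets).
    d = [[dist(a, b) for b in X] for a in X]

    # Seed the training set with the two most distant samples.
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda p: d[p[0]][p[1]],
    )
    train = [i, j]
    rest = [k for k in range(n) if k not in (i, j)]

    # Repeatedly add the sample whose nearest selected neighbour is
    # farthest away, so the training set spreads over descriptor space.
    while len(train) < n_train:
        pick = max(rest, key=lambda k: min(d[k][t] for t in train))
        train.append(pick)
        rest.remove(pick)

    return train, rest


# Toy one-dimensional "descriptors"; real inputs would be encoded molecules.
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_idx, test_idx = kennard_stone_split(X, n_train=3)
```

The samples left over after selection form the test set, so the test molecules lie inside the region of descriptor space covered by the training set. An activity-ranked split, by contrast, partitions on the response values rather than on the descriptors.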
Similar resources
Smiles Counting the Smiles
This research summary, based on a review of the literature and a series of focus groups with NHS staff, shows that while it is difficult to measure how people feel about their work, there is much to suggest that morale and motivation of the NHS workforce are low. It identifies three key factors that affect morale and motivation: whether staff feel valued, their working environment, and resource...
SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules
Simplified Molecular Input Line Entry System (SMILES) is a single line text representation of a unique molecule. One molecule can however have multiple SMILES strings, which is a reason that canonical SMILES have been defined, which ensures a one to one correspondence between SMILES string and molecule. Here the fact that multiple SMILES represent the same molecule is explored as a technique fo...
Frequent SMILES
Predictive graph mining approaches in chemical databases are extremely popular and effective. Most of these approaches first extract frequent sub-graphs and then use them as features to build predictive models. In the work presented here, the approach taken is similar. However, instead of frequent sub-graphs, frequent trees, based on SMILES strings are derived. For this, the SMILES strings of c...
SMILES. 2. Algorithm for generation of unique SMILES notation
(24) Ritter, G. L.; Isenhour, T. L. Minimal Spanning Tree Clustering of Gas Chromatographic Liquid Phases. Comput. Chem. 1977, 1, 145-153. Everitt, B. Cluster Analysis; Halsted: New York, 1974. Balaban, A. T. Chemical Graphs. XXXIV. Five New Topological Indices for the Branching of Tree-Like Graphs. Theor. Chim. Acta 1979, 53, 355-375. Balaban, A. T.; Motoc, I. Chemical Graphs. XXXVI. Correlati...
Smil in X-smiles
The World Wide Web Consortium has specified the Synchronized Multimedia Integration Language (SMIL), which is intended to bring multimedia to the World Wide Web. The most popular browsers still do not fully support SMIL, which slows its adoption. In this paper, we describe our implementation of a SMIL 1.0 player. The player is part of our X-Smiles browser, which is an open source XML browser.
Journal
Journal title: Machine Learning and Knowledge Extraction
Year: 2022
ISSN: 2504-4990
DOI: https://doi.org/10.3390/make4030034